Synthetic voices in the foreign language context

Authors: Bione Tiago, Cardoso Walcir
Publication venue: (co-sponsored by Center for Open Educational Resources and Language Learning, University of Texas at Austin)
Publication date: 01/02/2020

This study evaluated the voice of a modern English text-to-speech (TTS) system in an English as a foreign language (EFL) context in terms of its speech quality, ability to be understood by L2 users, and potential for focus on specific language forms. Twenty-nine Brazilian EFL learners listened to stories and sentences, produced by a TTS voice and a human voice, and rated them on a 6-point Likert scale according to holistic criteria for evaluating pronunciation: comprehensibility, naturalness, and accuracy. In addition, they were asked to answer a set of comprehension questions (to assess understanding), to complete a dictation/transcription task (to measure intelligibility), and to identify whether the target past -ed form was present or not in decontextualized sentences. Results indicate that the performance of the TTS and human voices was perceived similarly in terms of comprehensibility, while ratings for naturalness were unfavorable for the synthesized voice. For the text comprehension, dictation, and aural identification tasks, participants performed relatively similarly in response to both voices. These findings suggest that TTS systems have the potential to be used as pedagogical tools for L2 learning, particularly in EFL settings, where natural occurrence of the target language is limited or non-existent.

ScholarSpace at University of Hawai'i at Manoa

Synthetic voices in the foreign language context

Author: Bione Alves Tiago
Publication date: 12/09/2017

Second language (L2) researchers and practitioners have explored the pedagogical capabilities of text-to-speech (TTS) synthesizers for their potential to enhance the acquisition of writing (Kirstein, 2006), vocabulary and reading (Proctor, Dalton, & Grisham, 2007), and pronunciation (Cardoso, Collins, & White, 2012; Liakin, Cardoso, & Liakina, 2017; Soler-Urzua, 2011). Despite the positive evidence supporting the use of TTS as a learning tool, these applications need to be formally evaluated for their potential to promote the conditions under which languages are acquired, particularly in an English as a foreign language (EFL) environment, as suggested by Cardoso, Smith, and Garcia Fuentes (2015). The current study evaluated the voice of a modern English TTS system, used in an EFL context in Brazil, in terms of its speech quality, ability to be understood by L2 users, and potential for focus on specific language forms. The evaluation was operationalized based on the following criteria: (1) users’ ratings of holistic features (comprehensibility, naturalness, and accuracy, as defined by Derwing & Munro, 2005); (2) intelligibility (the extent to which a message is actually understood), measured with a dictation task; (3) text comprehension (i.e., users’ ability to understand a text and answer comprehension questions); and (4) users’ ability to hear a specific morpho-phonological feature (i.e., the aural identification of English past tense -ed). Twenty-nine Brazilian EFL learners listened to stories and sentences, produced alternately by a TTS voice and a human voice, and rated them on a 6-point Likert scale according to the above-mentioned holistic criteria (comprehensibility, naturalness, and accuracy). In addition, they were asked to answer a set of comprehension questions to assess their ability to understand what they had heard.
To measure intelligibility, participants completed a dictation task in which they were asked to transcribe utterances, as recommended by Derwing and Munro (2005). Finally, participants completed an aural identification task in which they judged, for each of 16 sentences, whether the target feature (the past marker -ed) was present or not. After these tasks were completed, semi-structured interviews were conducted to collect data regarding participants’ perceptions of the technology. Results indicate that the performance of the TTS and human voices was perceived similarly in terms of comprehensibility, while ratings for naturalness were unfavorable for the TTS voice. In addition, participants performed relatively similarly in response to both voices with respect to the tasks involving text comprehension, dictation, and identifying a target linguistic form (past -ed) in aural input. These findings suggest that TTS systems have the potential to be used as pedagogical tools for L2 learning, particularly in an EFL setting where natural occurrence of the target language is limited or non-existent.

Concordia University Research Repository